This disclosure relates generally to online systems, and more specifically to processing data received at a data processing system of an online system.
Online systems, such as social networking systems, allow users to connect to and to communicate with other users of an online system. Users may create profiles on an online system that are tied to their identities and include information about the users, such as interests and demographic information. The users may be individuals or entities such as corporations or charities. Online systems allow users to easily communicate and to share content with other online system users by providing content to an online system for presentation to other users. Content provided to an online system by a user may be declarative information provided by a user, status updates, check-ins to locations, images, photographs, videos, text data, or any other information a user wishes to share with additional users of the online system. An online system may also generate content for presentation to a user, such as content describing actions taken by other users on the online system.
Additionally, many online systems allow publishing users (e.g., businesses) to sponsor presentation of content on an online system to gain public attention for a publishing user's products or services or to persuade other users to take an action regarding the publishing user's products or services. Content that the online system presents to users in exchange for compensation is referred to as “sponsored content.” Many online systems receive compensation from a publishing user for presenting online system users with certain types of sponsored content provided by the publishing user. Frequently, online systems charge a publishing user for each presentation of sponsored content to an online system user or for each interaction with sponsored content by an online system user. For example, an online system receives compensation from a publishing user each time a content item provided by the publishing user is displayed to another user on the online system, each time another user is presented with the content item on the online system and interacts with the content item (e.g., selects a link included in the content item), or each time another user performs one or more particular actions after being presented with the content item (e.g., visits a website or physical location associated with the publishing user who provided the content item).
An online system that presents content to its users in exchange for compensation from a publishing user (i.e., sponsored content) may provide the publishing user with various metrics describing certain actions performed by other users of the online system after being presented with the sponsored content. These metrics describe the effectiveness of the sponsored content at eliciting those actions. For example, an online system presents users with a content item and, based on information received from client devices on which users interact with the content item, maintains a number of users who select a link included in the content item or a number of times the users visit a website associated with the content item during a particular time interval. Based on the number of users who selected the link or the number of times the users visited the website after being presented with the content item, the online system determines a metric and includes the metric in a report describing the content item's effectiveness that is provided to the publishing user associated with the content item.
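As an illustration of the counting described above, the following is a minimal sketch, not the online system's actual implementation. The `Event` record, the event labels ("impression", "click", "visit"), and the function name `effectiveness_report` are all hypothetical names introduced for this example. The sketch counts distinct users who were presented with a content item, distinct users who selected its link, and the number of website visits within a time interval, and derives a simple rate from those counts.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    # Hypothetical event record reported by a client device.
    user_id: str
    content_id: str
    kind: str        # assumed labels: "impression", "click", or "visit"
    timestamp: int   # seconds since an arbitrary epoch


def effectiveness_report(events, start, end):
    """Summarize events per content item within the interval [start, end)."""
    presented = defaultdict(set)   # content_id -> users shown the item
    clicked = defaultdict(set)     # content_id -> users who selected the link
    visits = defaultdict(int)      # content_id -> number of website visits

    for e in events:
        if not (start <= e.timestamp < end):
            continue  # outside the reporting interval
        if e.kind == "impression":
            presented[e.content_id].add(e.user_id)
        elif e.kind == "click":
            clicked[e.content_id].add(e.user_id)
        elif e.kind == "visit":
            visits[e.content_id] += 1

    report = {}
    for content_id, users in presented.items():
        # Only count clicks from users who were actually presented the item.
        clickers = clicked[content_id] & users
        report[content_id] = {
            "users_presented": len(users),
            "users_clicked": len(clickers),
            "visits": visits[content_id],
            "click_rate": len(clickers) / len(users),
        }
    return report
```

The rate computed here is only one possible metric. A report could equally expose the raw counts or a per-interval breakdown.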
Determining metrics describing actions performed by users of an online system often involves performing complex, resource-intensive operations on large amounts of data in short periods of time to extract, analyze, and process information to provide meaningful reports. For example, to generate metrics describing events associated with various content items presented at different time intervals by an online system, the online system quickly receives, formats, analyzes, organizes, and presents the required information. To efficiently process the significant amount of information required to generate various metrics, online systems often use data processing pipelines capable of processing an incoming stream of data in a short amount of time. For example, a data processing pipeline distributes operations among various components of the data processing pipeline to quickly process the incoming stream of data. In various implementations, a data processing pipeline may include components operating on different computing devices and in different locations.
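The division of work among pipeline components can be pictured with the following minimal, single-process sketch. It is an assumption-laden stand-in: the stage names (`parse`, `normalize`, `aggregate`), the JSON record layout, and the field names `content_id` and `action` are hypothetical, and in a real deployment each stage might run on a different computing device rather than as a chained generator.

```python
import json
from collections import Counter


def parse(lines):
    """Stage 1 (hypothetical): decode each incoming line into a record."""
    for line in lines:
        yield json.loads(line)


def normalize(records):
    """Stage 2 (hypothetical): keep only the needed fields in a uniform format."""
    for r in records:
        yield {"content_id": r["content_id"], "action": r["action"].lower()}


def aggregate(records):
    """Stage 3 (hypothetical): count actions per content item."""
    counts = Counter()
    for r in records:
        counts[(r["content_id"], r["action"])] += 1
    return counts


def run_pipeline(lines):
    # Each stage consumes the previous stage's output as a stream, so no
    # stage has to hold the entire input in memory.
    return aggregate(normalize(parse(lines)))
```

Chaining generators keeps each component responsible for one operation, which mirrors how a distributed pipeline assigns separate operations to separate components.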
However, in some circumstances, data received at a data processing pipeline may be lost in the pipeline before or during processing, causing inaccurate determination of metrics. For example, a data processing pipeline having multiple components, each performing a specified process on individual pieces of data as they move through the pipeline, may lose a piece of data during processing (e.g., when a component fails to process the piece of data due to a power failure or logic error). If data is lost in the pipeline and the loss is not detected, metrics that depend on the lost data may be incomplete or inaccurate. For example, if the online system performs a series of additive operations on data being processed through a data processing pipeline to measure a number of times a user interacts with a particular content item after being presented with the content item, the measurement is inaccurately low if data describing user interactions with the content item is lost in the data processing pipeline during processing. Accordingly, metrics that depend on data lost in the data processing pipeline will also be inaccurately low if the online system does not detect and correct for the lost data when determining the metrics. Hence, undetected loss of data in a data processing pipeline at the online system may cause the online system to generate metrics that inaccurately describe performance of various content items presented on the online system.
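The undercounting problem described above can be reproduced with a small sketch. Everything here is hypothetical and introduced only for illustration: the `CountingStage` wrapper, the `flaky_enrich` function with its deliberate logic error, and the idea of comparing per-stage input and output counts, which is one simple way to notice a loss and is not presented as the approach of this disclosure.

```python
class CountingStage:
    """Wraps a per-record function and tracks how many records go in and out."""

    def __init__(self, name, fn):
        self.name = name
        self.fn = fn
        self.received = 0
        self.emitted = 0

    def process(self, records):
        out = []
        for record in records:
            self.received += 1
            result = self.fn(record)
            if result is not None:  # None models a record lost inside the stage
                self.emitted += 1
                out.append(result)
        return out


def flaky_enrich(record):
    # Hypothetical logic error: records lacking a "region" field are
    # silently dropped instead of being passed through.
    if "region" not in record:
        return None
    return record


def run(records, stages):
    """Pass the records through each stage in order."""
    for stage in stages:
        records = stage.process(records)
    return records


def find_losses(stages):
    """Return {stage name: records lost} for stages whose counts disagree."""
    return {
        s.name: s.received - s.emitted
        for s in stages
        if s.received != s.emitted
    }
```

If five interaction records enter such a pipeline and two of them lack the assumed `region` field, an additive count over the output reports three interactions instead of five. The totals alone give no hint of the shortfall; only the per-stage comparison in `find_losses` shows that the `flaky_enrich` stage received five records and emitted three.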